perm filename EVALU[DIS,DBL]6 blob
sn#219341 filedate 1976-06-14 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00020 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00003 00002 .NSECP(Evaluating AM)
C00006 00003 .SSEC(Judging Performance)
C00012 00004 . SSSEC(AM's Ultimate Discoveries)
C00017 00005 . SSSEC(The Magnitude of AM's Progress)
C00021 00006 . SSSEC(The Quality of AM's Route)
C00034 00007 . SSSEC(The Character of the User-System Interactions)
C00041 00008 . SSSEC(AM's Intuitive Powers)
C00046 00009 . SSSEC(Experiments on AM)
C00050 00010 . SSSEC(How to Perform Experiments on AM)
C00055 00011 . SSSEC(Future Implications of this Project)
C00065 00012 . SSSEC(Comparison to Other Systems)
C00079 00013 .SSEC(Capabilities and Limitations of AM)
C00080 00014 . SSSEC(Current Abilities)
C00088 00015 . SSSEC(Current Limitations)
C00095 00016 . SSSEC(Limitations of the Agenda scheme)
C00105 00017 . SSSEC(Choice of Domain)
C00113 00018 . SSSEC(Limitations of the Model of Math Research)
C00119 00019 . SSSEC(Ultimate powers and weaknesses)
C00123 00020 .SSEC(Final Conclusions)
C00125 ENDMK
C⊗;
.NSECP(Evaluating AM)
.TURN ON "{}"
This chapter contains discussions "meta" to AM itself.
First comes an essay about judging the performance of a system like AM.
This is a very hard task, since AM has no "goal". Even using current mathematical
standards, should AM be judged on what it produced, on the quality of the
path which led to those results, or on the difference between what it started with
and what it finally derived?
Section {SECNUM}.2 deals with the capabilities and limitations of AM.
.B48
⊗8# ⊗* What concepts can be elicited from AM now?
With a little tuning/tiny additions?
⊗8# ⊗* What are some notable omissions in AM's behavior? Can the user elicit these?
⊗8# ⊗*
What could probably be done within a couple months of modifications?
⊗8# ⊗*
Aside from a total change of domain, what kinds of activities does AM lack
(e.g., proof capabilities)? Are any discoveries (e.g., analytic function theory)
clearly beyond its design limitations?
.E
Finally, all the conclusions will be gathered together, and a short summary of
this project will be tolerated.
.TURN OFF "{}"
.SSEC(Judging Performance)
One may view AM's activity as a progression from an initial core of knowledge
to a more sophisticated "final"$$ As has been stressed before, AM has no
fixed goal, no "final" state. For practical purposes, however, the totality of
explorations by AM is about the same as the "best run so far"; either of these can be
thought of as defining what is meant by the "final" state of knowledge. $
body of concepts and their facets.
Then each of the following is a reasonable way to measure success, to "judge" AM:
.BN SKIP 1
λλ By AM's ultimate achievements. Examine the list of
concepts and methods AM developed.
Did AM ever discover anything interesting yet unknown to the user?$$
The "user" is a human who works with AM interactively, giving it hints, commands,
questions, etc.
Notice that by "new" we mean new to the user, not new to Mankind.
This might occur if the user were a child, and AM discovered
some elementary facts of arithmetic.
This is not really
so provincial: mathematicians take "new" to mean new to Mankind, not
new in the Universe. I feel philosophy slipping in, so this footnote is
terminated. $ Anything new to Mankind?
λλ By the character of the difference between the initial and final states.
Progressing from set theory to number theory is much more impressive than progressing
from two-dimensional geometry to three-dimensional geometry.
λλ By the quality of the route AM took to accomplish these advances:
How clever, how circuitous,
how many of the detours were quickly identified as such and abandoned?
.BREAK
λλ By the character of the User--System interactions: How important is the user's
guidance? How closely must he guide AM? What happens if he doesn't say anything ever?
When he does want to say something, is there an easy way to express that to AM,
and does AM respond well to it?
Given a reasonable kick in the right direction, can AM develop the mini-theories
which the user intended, or at least something equally interesting?
λλ By its intuitive heuristic powers: Does AM believe in "reasonable" conjectures?
How accurately does AM estimate the difficulty of tasks it
is considering?
Does AM tie together (e.g., as analogous) concepts which are formally unrelated
yet which benefit from such a tie?
λλ By the results of the experiments described in
Section {[2] EXPT}.{[2] EXPTSSEC}, page {[3] EXPTPAGE}.
How fragile is the worth numbering scheme? The priority of tasks scheme?
How domain-specific are those heuristics really? The set of facets?
λλ By the fact that the kinds of experiments outlined in the next section can
easily be "set up" and performed on AM.
Regardless of the experiments' outcomes,
the features of AM which allow them to be carried
out at all are worthy of note.
λλ By the implications of this project. What can AM suggest about educating
young mathematicians (and scientists in general)?
What can AM say about doing math (about empirical research in general)?
λλ By comparisons to other, similar systems.
.E
.ONCE TURN ON "{}"
For each of these {BNN} measuring criteria,
a subsection will now be provided, to illustrate (i) a
stunning achievement and (ii) a stunning failure of AM along each dimension, and
(iii) to
try to characterize AM's performance objectively according to that measure.
. SSSEC(AM's Ultimate Discoveries)
Two of the ideas which AM proposed were totally new to the user:
.MAXDIVPAGE: PAGE
.MAXDIVSEC: SECNUM
.MAXDIVSSEC: SSECNUM
.BN
λλ Consider numbers with an abnormally high number of divisors.
If d(n) represents the number of divisors of n,$$e.g., d(12) = ||α{1,2,3,4,6,12α}||
= 6. $ then AM defines the set of "maximally-divisible numbers" to be
⊗6α{nεN | (∀m<n) d(m)<d(n)α}⊗*. By factoring each such number into primes,
AM noticed a regularity in them. The author then developed a "mini-theory" about
these numbers. It later turned out that Ramanujan had already proposed that very
same definition (in 1915), and had found that same regularity. His results only partially
overlap those of AM and the author, however.
λλ
⊗1AM found a cute geometric application of Goldbach's conjecture. Given a set of
all angles of prime degree, from 0 to 180↑o,$$Included are 0↑o and 1↑o, as well
as the "typical" primes 2↑o, 3↑o, 5↑o, 7↑o, 11↑o,..., 179↑o. $ then ⊗4any⊗* angle
between 0 and 180 degrees can be approximated to within 1↑o by adding a pair of
angles from this prime set.
In fact, it is hard to find smaller sets than this one which approximate any
angle to that accuracy.
.E
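Both discoveries are easy to reproduce outside AM. The sketch below is in Python rather than AM's Interlisp, and all the function names are mine, not AM's: it computes d(n), lists the maximally-divisible numbers, and checks the prime-angle approximation claim by brute force.

```python
# A sketch, in Python rather than AM's Interlisp, of the two discoveries above.

def d(n):
    """Number of divisors of n; e.g. d(12) = |{1,2,3,4,6,12}| = 6."""
    return sum(1 for k in range(1, n + 1) if n % k == 0)

def maximally_divisible(limit):
    """AM's definition: those n such that d(m) < d(n) for every m < n."""
    best, found = 0, []
    for n in range(1, limit + 1):
        if d(n) > best:
            best = d(n)
            found.append(n)
    return found

def primes_upto(n):
    """Naive trial-division prime list (adequate for n <= 179 here)."""
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, p))]

def prime_angle_approx():
    """The geometric Goldbach application: every integer angle from 0 to 180
    degrees lies within 1 degree of a sum of two angles drawn from the set
    {0, 1} together with the primes up to 179."""
    prime_degrees = [0, 1] + primes_upto(179)
    sums = {a + b for a in prime_degrees for b in prime_degrees}
    return all(any(abs(angle - s) <= 1 for s in sums)
               for angle in range(181))

maximally_divisible(50)   # -> [1, 2, 4, 6, 12, 24, 36, 48]
prime_angle_approx()      # -> True
```

Run to a limit of 50, the first function reproduces the opening terms of Ramanujan's highly composite numbers; the second confirms the angle-approximation property exhaustively.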
By and large, the other concepts which AM developed were either already-known,
or real losers. Here is one example of a real turkey:
AM composed Set-insert with the predicate Equality (i.e., absolute equality
of two sets). The result was an operation Insert-o-Equal(x,y,z), which
first tested whether x was Equal to y or not. The result of this was either
T or F$$ Actually, in LISP, it was easier to have such results
always be either T or NIL$. Next, this result was inserted into z.
For example, Insert-o-Equal({1,2},{3,4},{5,6})={F,5,6}. The first two
arguments are not equal, so F was inserted into the third.
This is a real loser of an operation, if ever there was one.
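The composition is trivial to mimic. A minimal sketch in Python (AM's actual truth values T and NIL appear here as True and False):

```python
# A sketch, in Python rather than AM's Interlisp, of the loser Insert-o-Equal.

def equal(x, y):
    """AM's Equality predicate: absolute equality of two sets."""
    return x == y

def set_insert(e, s):
    """AM's Set-insert: add element e to set s."""
    return {e} | s

def insert_o_equal(x, y, z):
    """Insert-o-Equal = Set-insert o Equality: test whether x equals y,
    then insert the resulting truth value into z."""
    return set_insert(equal(x, y), z)

insert_o_equal({1, 2}, {3, 4}, {5, 6})   # -> {False, 5, 6}
```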
Another kind of loser occurred whenever AM entered upon some "regular" behavior.
For example, if it decided that Compose was interesting, it might try to create
some examples of compositions. It could do this by picking two operations and
composing them. What better operations to pick than Compose and Compose!
Thus Compose-o-Compose would be born. By composing that with itself, an even
more monstrous operation is spawned: Compose-o-Compose-o-Compose-o-Compose.
Since AM actually uses the word "Compose" instead of that little infix circle, the
PNAME of the data structure it creates is horrendous. Its use is almost
nonexistent: it must take 5 operations as arguments, and it returns a new
operation which is the composition of those five.
In summary, then, we may say that AM produced a few ideas
new to the author,
a couple of which
were new to Mankind (although of questionable interest!). Several additional
"new" concepts were created which both AM and the user agreed were better forgotten.
The "level" of AM's fruits
could be classified as that of a high-school student, although this is
deceptive since AM lacks the ⊗4breadth⊗* of abilities any human being possesses.
. SSSEC(The Magnitude of AM's Progress)
We can ask the following kind of question: how many "levels" did AM progress
along? This is a fuzzy notion, but basically we shall say that a new level is
reached when a valuable new
bunch of connected concepts are
defined in terms of concepts at a lower level.
For example, AM started out knowing about Sets and Set-operations.
When it progressed
to numbers and arithmetic, that was one big step up to a new level. When it
zeroed in on primes, unique-factorization, and divisibility, it had moved up another
level.
When fed simple geometry concepts, AM moved up one level when it defined some
generalizations of the equality of geometric figures (parallel lines,
congruent and similar triangles, angles equal in measure) and their invariants
(rotations, translations, reflections).
The above few examples are unfortunately exhaustive: that just about sums up
the major advances AM made.
Its progress was halted not so much by cpu time and space, as by a paucity of
meta-knowledge: heuristic rules for filling in new heuristic rules.
Thus AM's successes are finite, and its failures infinite, along this dimension.
A more charitable view might compare AM to a human who was forced to start from
set theory, with AM's sparse abilities. In that sense, perhaps, AM would rate
quite well. The "unfair" advantage it had was the presence of many heuristics
which themselves were gleaned from mathematicians: i.e., they are like compiled
hindsight. A major purpose of mathematics education in the university is to
instill these heuristics into the minds of the students.
AM is thus characterized as possessing heuristics which are powerful enough
to take it a few "levels" away from the kind of knowledge it began with,
but ⊗4only⊗* a few levels. The limiting factors are (i) the heuristic rules
AM begins with, and more specifically (ii) the expertise in recognizing and
compiling new heuristics, and more generally (iii) a lack of real-world
situations to draw upon for analogies, intuitions, and applications.
. SSSEC(The Quality of AM's Route)
No matter what its achievements were, or the magnitude of its
advancement from initial knowledge, AM could still be judged
"unintelligent" if, e.g., it were exploring vast numbers of absurd
avenues for each worthwhile one it found. The quality of the route AM
followed is thus quite significant.
AM performed better in this respect than expected, although much
worse than a real mathematician would have done.
Of the hundred or
so new concepts it defined, at least fifty were acceptable -- in the
sense that one can defend AM's reasoning in at least exploring
them; in the sense that a human mathematician might have considered them. Of these
fifty "winners", about a dozen were significant -- that is, useful,
catalytic, well-known by human mathematicians, etc. Unfortunately,
the fifty concepts which were losers were ⊗4real⊗* losers. In this
respect, AM fell far below the standards a mathematician would set
for acceptable behavior: ⊗4all⊗* his failures should have at least
seemed promising at the beginning. Half of AM's adventures were
poorly grounded, and (perhaps due to a lack of intuition) AM bothered
with concepts which were "obviously" trivial: the set of even primes,
the set of numbers with only one divisor, etc.
The human mathematician might momentarily consider many poor courses of action,
but he would only spend a significant amount of time on very promising ones.
AM on the other hand considered very few lunatic activities, but wasted
much time on tasks which a human would have recognized as dead-ends.
Once again we must observe that the quality of the route is a
function of the quality of the heuristics. If there are many clever
little rules, then the steps AM takes will often seem clever and
sophisticated. If the rules superimpose nicely, joining together to
collectively buttress some specific activity, then they may even seem
surprisingly effective to their creator.
Such moments of great insight (i.e., where AM's reasoning surpassed
mine) did occur, although rarely. Both of AM's "biggie discoveries" started
by its examining concepts I felt weren't really interesting. For
example, I didn't like AM spending so much time worrying about
numbers with many divisors; I "knew" that the converse concept of
primes was infinitely more valuable. And yet AM saw no reason to give
up on maximally-divisible numbers; it had several good reasons for
continuing that inquiry (they were the converse to primes which had
already proved interesting, their frequency within the integers was
neither very high nor very low nor very regular, their definition was
simple, they were extremals of the interesting operation
"Divisors-of", etc., etc.) Similarly, I "knew" that Goldbach's
conjecture was useless, so I was unhappy that AM was bothering to try
to apply it in the domain of geometry. In both cases, AM's reasons
for its actions were unassailable, and in fact it did discover some
interesting new ideas both times.
Sometimes AM's behavior was displeasing, even though it wasn't
"erring". Occasionally it was simultaneously developing two
mini-theories (say primes and maximally-divisibles). Then it might
pick a task or two dealing with one of these topics, then a task or
two dealing with the other topic, etc. The task picked at each moment
would be the one with the highest priority value. As a theory is
developed, the interestingness of its associated tasks goes up and
down; there may be doldrums for a bit, just before hitting upon the
track that will lead to the discovery of a valuable relationship.
During these temporary lags, tasks related to
the other theory's concepts will appear to have higher priority values: i.e.,
better reasons supporting them. So AM would then skip over to one of ⊗4those⊗*
concepts, develop it until ⊗4its⊗* doldrums, then return to the first
one, etc. Most humans found this behavior unpalatable$$ Although it might
be the "best" from a dynamic management point of view, it probably
would be wrong in the long run. Major advances really do have lulls
in their development. $ because AM had no compunction about skipping
from one topic to another. Humans have to retune their minds to do
this skipping, and therefore treat it much more seriously. For that
reason, AM was given an extra mobile reason to use for certain tasks
on its agenda: "focus of attention". Any task with the same kind of
topic as the ones just executed is given this extra reason, which
raises its priority value a little. This was enough sometimes to
keep AM working on a certain mini-theory when it otherwise would have
skipped somewhere else.
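The effect of this extra reason can be sketched in a few lines of Python (not AM's Interlisp); the bonus value and the task names below are illustrative, not taken from AM:

```python
# A sketch of the "focus of attention" bonus on an agenda of tasks.

FOCUS_BONUS = 50   # illustrative: the text says only that it raises priority "a little"

def pick_task(agenda, last_topic=None):
    """Pick the highest-priority task, after granting the focus-of-attention
    bonus to tasks on the same topic as the ones just executed."""
    def effective(task):
        return task["priority"] + (FOCUS_BONUS if task["topic"] == last_topic else 0)
    return max(agenda, key=effective)

agenda = [
    {"name": "fill in examples of Primes",      "topic": "Primes",  "priority": 300},
    {"name": "generalize Maximally-divisibles", "topic": "Max-div", "priority": 320},
]

pick_task(agenda)                        # -> the Max-div task (320 > 300)
pick_task(agenda, last_topic="Primes")   # -> the Primes task (300 + 50 > 320)
```

The bonus is deliberately small: it keeps AM on one mini-theory through a brief lull, but a genuinely better task elsewhere still wins.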
The above "defect" is a cute little kind of behavior AM exhibited
which was non-human but not clearly "wrong". There were ⊗4genuine⊗*
bad moments also, of course. For example, AM became very excited$$
This anthropomorphism must be excused. Technically, we may say that
the priority value of the best job on the agenda is the "level of
excitement" of AM. 1000 would be ecstasy, 100 boredom, and 0 sensory
deprivation. $ when the conjunction of "empty-set" and other concepts
kept being equal to empty-set. AM kept repeating conjunctions of this
form, rather than stepping back and generalizing this data into a
(phenomenological) conjecture. Similar blind looping behavior
occurred when AM kept composing Compose with itself, over and over.
In general, one could say that "regular" behavior of any kind signals
a probable fiasco. A heuristic rule to this effect halted most of
these disgraceful antics. This rule had to be careful, since it was
almost the antithesis of the "focus of attention" idea mentioned in
the preceding paragraph. Together, those two rules seem to say that
you should continue on with the kind of thing you were just doing,
but not for ⊗4too⊗* long a time.
The moments of insight were two in number; the moments of stupid
misdirection were about twenty times as many.
AM had very few heuristics for deciding that something was
⊗4un⊗*interesting, that work on it should halt for a long time.
Rather, AM simply would not have anything positive to say about that
concept, and other concepts would be explored instead, essentially by
default. Each concept had a worth component which corresponded to
its right to life (its right to occupy storage in core). This number
slowly declines with time, and is raised whenever something
interesting happens with that concept. If it ever falls below a
certain threshold, and if space is exhausted$$
No concepts were forgotten in this way until near the end of AM's runs,
when AM would usually collapse from several causes including lack of space.
$, then
the concept is forgotten: its list cells are
garbage collected, and all references to it are erased, save those
which will keep it from being re-created. This again is not
purposeful forgetting, but rather by default; not because X is seen as
a dead-end, but simply because other concepts seem so much more
interesting for a long time.
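This decay-and-forget-by-default scheme can be sketched as follows (Python rather than AM's Interlisp; the decay rate, boost, and threshold are illustrative, since the text specifies none of them):

```python
# A sketch of AM's forgetting-by-default on concepts' Worth ratings.

DECAY = 0.9            # illustrative values throughout
BOOST = 100
WORTH_THRESHOLD = 100

def tick(concepts, interesting_names, space_exhausted):
    """One cycle: every concept's worth slowly declines; concepts involved in
    something interesting are boosted; and only when space is exhausted are
    the sub-threshold concepts actually forgotten."""
    for c in concepts:
        c["worth"] *= DECAY
        if c["name"] in interesting_names:
            c["worth"] += BOOST
    if space_exhausted:
        concepts = [c for c in concepts if c["worth"] >= WORTH_THRESHOLD]
    return concepts

cs = [{"name": "Primes", "worth": 400}, {"name": "Insert-o-Equal", "worth": 110}]
cs = tick(cs, {"Primes"}, space_exhausted=False)   # the loser decays but survives
cs = tick(cs, {"Primes"}, space_exhausted=True)    # now it is forgotten
```

Note that nothing in the sketch ever marks a concept as a dead-end; losers vanish only because nothing interesting ever boosts them back above the threshold.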
Thus AM did not develop the fifty "losers" very much: on average, only
1.5 tasks relevant to each of them were ever
chosen. The "winners" averaged about twice as many tasks, which
helped fill them out more. Also, the worth ratings of the losers
were far below those of the winners. So AM really did judge the
value of its new concepts quite well.
The final aspect of this important dimension of evaluation is the
quality of the reasons AM used to support each task it chose to work
on. Again, the English phrases corresponded quite nicely to the
"real" reasons a human might give to justify why something was worth
trying, and the ordering of the tasks on the agenda was rarely far
off from the one that I would have picked myself. This was perhaps
AM's greatest success: the rationality of its actions.
. SSSEC(The Character of the User-System Interactions)
AM is not a "user-oriented" system. There were many nice human-interaction
features in
the original grandiose
proposal for AM which never got off the drawing board. At the
heart of these features were two assumptions:
.BN
λλ The user must understand AM, and AM must likewise have a good
model of the particular human using him. The only time either should
initiate a message is when his model of the other is not what he
wants that model to be. In that case, the message should be
specifically designed to fix that discrepancy.$$ This idea was motivated
by a lecture given in 1975 by Terry Winograd$
λλ Each kind of message which is to pass between AM and its user
should have its own appropriate language. Thus there should be a
terse comment language, whereby the user can note how he feels about
what AM is doing, a questioning language for either party to get/give
reasons to the other, a picture language for communicating certain
relationships, etc.
.E
Neither of these ideas ever made it into the LISP code that is now
AM, although they are certainly not prohibited in any way by AM's
design. It would be a separate project, at the level of a master's
thesis I expect, for someone to build a nice user interface for AM$$
I am not actually calling for this to be done, merely indicating the
magnitude of the effort involved. A VERY nice user interface might be
much harder, at the level of a dissertation. $.
As one might expect, the reason for this atrophy is simply that very
little guidance from the user was needed by AM. In fact, all the
discoveries, cpu time quotes, etc. mentioned in this document are
taken from totally unguided runs by AM. If the user guides as well as
he can, then about a factor of 3 speedup is possible. Of course, this
assumes that the user is dragging AM directly along a line of
development he knows will be successful. The user's "reasons" at each
step are based essentially on hindsight. Thus this is not at all
"fair". If AM ever becomes more user-oriented, it would be nice to
let children (say 6-12 years old)
experiment with it, to observe them working with it in
domains unfamiliar to either of them.
.ONCE TURN ON "{}"
The user can "kick" AM in one direction or another, e.g., by
interrupting and telling AM that Sets are more interesting than
Numbers$$ To actually do this, the user will type control-I to
interrupt AM. He then types I, meaning "alter the interest of",
followed by the word "Sets". AM then asks whether this is to be
raised or lowered. He types back R, and AM asks how much, on a 1-10
scale. He replies 9, say, and then repeats this process for the
concept "Numbers". $. In that particular case, AM fails to develop
any higher-level set concepts (diagonalization, infinite sets, etc.)
and simply wallows around in finite set theory (de Morgan's laws,
associativity of Union, etc.). When geometric concepts are input,
and AM is kicked in ⊗4that⊗* direction, much nicer results are
obtained. See Experiment 5, page {[3] GEOEX}.
There is one important result to observe: the very best examples of
AM in action were brought to full fruition only by a human developer.
That is, AM thought of a couple great concepts, but couldn't develop them
well on its own. A human (the author) then took them and worked on them
by hand, and interesting results were achieved. These results could be
told to AM, who could then go off and look for new concepts to investigate.
This interaction is of course at a much lower frequency than the kind
of rapidfire question/answering talked about above. Yet it seems that
such synergy may be the ultimate mode of AM-like systems.
. SSSEC(AM's Intuitive Powers)
Let me hasten to mention that the word "intuitive" in this subsection's
title is not related to the (currently non-existent)
"Intuitions" facets of the concepts. What is meant is the totality of
plausible reasoning which AM engages in: empirical induction,
generalization, specialization, maintaining reasons for jobs on the
agenda list, creation of analogies between bunches of concepts, etc.
AM only considers conjectures which have been explicitly suggested:
either by empirical evidence, by analogy, or (de-implemented now:) by
Intuition facets. Once a conjecture has been formulated, it is tested in
all ways possible: new experimental evidence is sought (especially
extreme cases), it is examined formally$$
Currently, this is done in trivial ways. An open problem, which is under attack
now, is to add more powerful formal reasoning abilities to AM.
$to see if it follows from
already-known conjectures, etc.
Because of this grounding in plausibility, the only conjectures the
user ever sees (the ones AM is testing) are quite believable. If
they turn out to be false, both he and AM are surprised. For
example, both AM and the user were disappointed when nothing came out of the
concept of Uniquely-prime-addable numbers (positive integers which
can be represented as the sum of two primes in precisely one way).
Several conjectures were proposed via analogy with unique prime
factorization, but none of them held experimentally. Each of them
seemed worth investigating, to both the user and the system.
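The concept itself is easy to recompute. A Python sketch (not AM's Interlisp; the function names are mine):

```python
# Recomputing the Uniquely-prime-addable numbers.

def primes_upto(n):
    """Naive trial-division prime list."""
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, p))]

def uniquely_prime_addable(limit):
    """Positive integers expressible as p + q (p <= q, both prime) in
    precisely one way."""
    pset = set(primes_upto(limit))
    def representations(n):
        return sum(1 for p in pset if p <= n - p and (n - p) in pset)
    return [n for n in range(1, limit + 1) if representations(n) == 1]

uniquely_prime_addable(20)   # -> [4, 5, 6, 7, 8, 9, 12, 13, 15, 19]
```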
AM's estimates of the value of each task it attempts were often far
off from what hindsight proved their true values to be. Yet this was
not so different from the situation a real researcher faces, and it
made little difference on the discoveries and failures of the system.
AM occasionally mismanaged its resources due to errors in these
estimates. To correct for such erroneous prejudgments, heuristic
rules were permitted to dynamically alter the time/space quanta for
the current task. If some interesting new result turned up, then some
extra resources would be allotted. If certain heuristics failed, they
could reduce the time limits, so not as much total cpu time would be
wasted on this turkey.
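The quantum-adjustment idea can be sketched as follows (Python rather than AM's Interlisp; the costs, bonus, and cutoff values are illustrative):

```python
# A sketch of dynamic time-quantum adjustment for the current task.

def run_task(steps, initial_quantum=100):
    """Execute a task's steps under a time quantum that heuristic rules may
    adjust: an interesting result buys extra time; a failure cuts the limit
    so that little more cpu is wasted."""
    time_left = initial_quantum
    executed = []
    for step in steps:
        if time_left <= 0:
            break                           # quantum exhausted: abandon the task
        executed.append(step["name"])
        time_left -= step["cost"]
        if step.get("interesting"):
            time_left += 50                 # allot some extra resources
        if step.get("failed"):
            time_left = min(time_left, 10)  # cut losses on a turkey
    return executed

# An interesting result keeps a task alive past its original quantum:
run_task([{"name": "a", "cost": 60},
          {"name": "b", "cost": 60, "interesting": True},
          {"name": "c", "cost": 20},
          {"name": "d", "cost": 20}])      # -> ["a", "b", "c", "d"]
```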
An example of a nice conjecture is the unique factorization one. A
nice analogy was the one between angles and numbers (leading to the
application of Goldbach's conjecture). Another nice analogy was
between numbers and bags (and hence between bag-operations and what
we commonly call arithmetic operations).
Some poor analogies were considered, like the one between bags and
singleton-bags. The ramifications of this analogy were painfully
trivial$$ The bag-operations, applied to singletons, did not produce
singletons as their result: (x)∪(y) is (x,y) which is not a
singleton. Whether they did or not depended only on the equality or
inequality of the two arguments. There were many tiny conjectures
proposed which merely re-echoed this general conclusion. $.
. SSSEC(Experiments on AM)
.ONCE TURN ON "{}"
The experiments described in Section {[2] EXPT}.{[2] EXPTSSEC}, page
{[3] EXPTPAGE} provide some results relevant to the overall value of
the AM system. The reader should consult that section for details;
neither the experiments nor their results will be repeated here. A
few conclusions will be summarized, to show that AM fared well in
this dimension of evaluation.
The worth-numbering scheme for the concepts is fairly robust: even
when all the concepts' worths are initialized at the same value, the
performance of AM doesn't collapse, although it is noticeably
degraded.
Certain mutilations of the priority-value scheme for the agenda will
cripple AM, but it can resist most of the small changes tried in
various experiments.
The heuristics are specific to their stated domain of applicability.
Thus when working in geometry, the Operation heuristics were just as
useful as they were when AM worked in elementary set theory or number
theory. The set of facets seemed adequate for those domains, too. The
Intuition facet, which was rejected as a valid source of information
about sets and numbers, might have been more acceptable in geometry
(e.g., something similar to Gelernter's model of a geometric
situation).
All in all, then, we conclude that AM was fairly tough, and about as
general as its heuristics claimed it was. AM is not invincible, infallible, or
universal. Its strength lies in careful use of
heuristics. If there aren't enough domain-specific heuristics around,
the system will simply not perform well in that domain. If the
heuristic-using control structure of AM is tampered with$$ e.g.,
treat all reasons as equivalent, so you just COUNT the number of
reasons a task has, to determine its priority on the agenda.$,
there is some chance of losing vital guiding information which the
heuristics would otherwise supply.
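The footnoted mutilation is easy to state concretely. In the sketch below (Python rather than AM's Interlisp), the weighting formula is a plausible stand-in, not AM's actual one; the mutilated version merely counts reasons:

```python
import math

# A sketch of the footnoted mutilation of the priority scheme.

def weighted_priority(ratings):
    """A plausible weighted combination of a task's reason ratings (not AM's
    actual formula): one strong reason dominates, and extra feeble reasons
    add only a little."""
    return min(1000, round(math.sqrt(sum(r * r for r in ratings))))

def mutilated_priority(ratings):
    """The experimental mutilation: treat all reasons as equivalent and
    just COUNT them."""
    return 100 * len(ratings)

weighted_priority([900])               # -> 900
weighted_priority([50, 50, 50, 50])    # -> 100
mutilated_priority([900])              # -> 100: one excellent reason now loses...
mutilated_priority([50, 50, 50, 50])   # -> 400: ...to four feeble ones
```

The inversion in the last two lines is the vital guiding information the mutilation throws away.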
. SSSEC(How to Perform Experiments on AM)
.ONCE TURN ON "{}"
The very fact that the kinds of experiments mentioned in the last
section (and described in detail in Section
{[2]EXPT}.{[2]EXPTSSEC},
page {[3]EXPTPAGE}) can easily be "set up" and performed
on AM, reflects a nice quality of the AM program.
Most of those experiments took only a few minutes to set up, only a
few tiny modifications to AM. For example, the one where all the
Worth ratings were initialized to the same value was done by evaluating
the single LISP expression:
.B816
(MAPC CONCEPTS '(λ (c) (PUT c 'Worth 200)))
.E
Similarly, here is how AM was modified to treat all tasks as if they
had equal value: the function Pick-task has a statement of the form
.B816
(SETQ Current-task (First-member-of Agenda))
.E
All that was necessary was to replace the call on the function
"⊗6First-member-of⊗*"$$ In LISP, this function is actually
abbreviated "CAR". $ by the function "⊗6Random-member-of⊗*".
Even the most sophisticated experiment, the introduction of a new
bunch of concepts -- those dealing with geometric notions like
Between, Angle, Line -- took only a few hours to set up.$$ I had been
thinking about this particular experiment off and on for several
months. When I started to code the new concepts, I'd probably already
thought out what they should be fairly clearly. $
There are certain experiments one can't easily perform on AM:
removing all its heuristics, for example. Most heuristic search
programs would then wallow around, displaying just how big their
search space really was. But AM would just sit there, since it'd have
nothing plausible to do.
Many other experiments, while cute and easy to set up, are quite
costly in terms of cpu time. For example, the class of experiments of
the form: "remove heuristics x, y, and z, and observe the resultant
effect on AM's behavior". This observation would entail running AM
for an hour or two of cpu time! Considering the number of subsets of
heuristics, not all these questions are going to get answered in our
universe's lifetime. Considering the small probable payoff from any one such
experiment, very few should actually be attempted.
Most of the experiments one could think of can be quickly set up
-- but only by someone familiar with the LISP code of AM.
It would be quite hard to modify AM so that the untrained user could
easily perform these experiments. Essentially, that would demand that
AM have a deep understanding of its own structure. This is of course
desirable, fascinating, challenging, but wasn't part of the design of
AM.
. SSSEC(Future Implications of this Project)
One harsh measure of AM would be to demand what possible applications
it will have. This really means (i) the uses for the AM system, (ii)
the uses for the ideas of how to create such systems, (iii)
conclusions about math and science one can draw from experiments with
AM.
Here are some of these implications, both real and potential:
.BN PREFACE 1 INDENT 0,3,0
λλ New tools for computer scientists who want to create large
knowledge-based systems to emulate any creative human activity.
.BEGIN PREFACE 0 INDENT 8
⊗61a.⊗* The modular representation of knowledge that AM uses might
prove to be effective in any knowledge-based system. Division of a
global problem into a multitude of small chunks, each of them of the
form of setting up one quite local "expert" on some concept, is a
nice way to make a hard task more manageable. Conceivably, each
needed expert could be filled in by a human who really is an expert
on that topic. Then the global abilities of the system would be able
to rely on quite sophisticated local criteria. Fixing a set of
facets once and for all permits effective inter-module communication.
⊗61b.⊗* The ideas about heuristics having domains of applicability,
of tacking them onto the most general knowledge source (concept,
module) they are relevant to, rippling, etc., may all be carried over
unchanged into many fields of human creativity, wherever local
guiding rules exist.
.E
λλ A body of heuristics which can be built upon by others.
.BEGIN PREFACE 0 INDENT 8
⊗62a.⊗* Most of the particular heuristic judgmental criteria for
interestingness, utility, etc., might be valid in developing
theorizers in other sciences. Recall that each rule has its domain
of applicability; many of the heuristics in AM are quite general.
⊗62b.⊗* Just within the small domain in which AM already works, this base
of heuristics might be enlarged through contact with various
mathematicians. If they are willing to introspect and add some of
their "rules" to AM's existing base, it might gradually grow more and
more powerful.
.E
λλ New and better strategies for math educators.
.BEGIN PREFACE 0 INDENT 8
⊗63a.⊗* If the repertoire of intuition (simulated real-world
scenarios) were sufficient for AM to develop elementary concepts of
math, then educators should ensure that children (4-6 years old) are
thoroughly exposed to those scenarios. Such activities would include
seesaws, slides, piling marbles into pans of a balance scale,
comparing the heights of towers built out of cubical blocks, solving
a jigsaw puzzle, etc. Unfortunately, AM failed to show the value of
these few scenarios. This was a potential application which was not
confirmed.
⊗63b.⊗* Since the key to AM's success seems to be its heuristics, and
not the particular concepts it knows, the whole orientation of
mathematics education should be modified, to provide experiences for
the student which will build up these rules in his mind. Learning a
new theorem is worth much less than learning a new heuristic which
lets you discover new theorems.$$
Usually. One kind of exception is the following: the ability to take a powerful
theorem, and extract from it a new, powerful heuristic. AM cannot do this,
but it may turn out that this mechanism is quite crucial for humans' obtaining new
heuristics. This is another open research problem.
$ I am no doubt far from the first to
urge such a revision.
⊗63c.⊗* One use for AM itself would be as a "fun" teaching tool. If a
nice user interface is constructed, AM could serve as a model for,
say, college freshmen with no math research experience. They could
watch AM, see the kinds of things it does, play with it, and perhaps
get a real flavor for (and get turned on by) doing math research. A
vast number of brilliant minds are too turned off by high-school
drilling and college calculus to stick around long enough to find out
how exciting -- and different -- research math is compared to
textbook math.
.E
λλ Further experiments on AM might tell us something about how the
theory formation task changes as a theory grows in sophistication.
For example, can the same methods which lead AM from premathematical
concepts to arithmetic also lead AM from number systems up to
abstract algebra? Or are a new set of heuristic rules or extra
concepts required? My guess is that a few of each are lacking
currently, but ⊗4only⊗* a few. There is a great deal of disagreement
about this subject among mathematicians.
By tracing along the development of mathematics, one might categorize
discoveries by how easy they would be for an AM-like system to find.
Sometimes, a discovery required the invention of a brand new heuristic
rule, which would clearly be beyond AM as currently designed.
Sometimes, discovery is based on the lucky random combination of
existing concepts, for no good ⊗4a priori⊗* reason. It would be
instructive to find out how often this is necessarily the case: how
often ⊗4can't⊗* a mathematical discovery be motivated and "explained"
using heuristic rules of the kind AM possesses?
λλ An unanticipated result was the creation of new-to-Mankind math
(both directly and by defining new, interesting concepts to
investigate by hand). An even more exciting prospect, which never
materialized, was that AM would find a new redivision of existing
concepts, an alternate formulation of some established theory, much
like Hamiltonian mechanics is an alternate unification of the data
which led to Newtonian mechanics. Since AM is a collection of
heuristic rules, it could be augmented by several individuals. This
kind of system could then conceivably integrate their particular
insights, and do quite spectacular mathematics. This is still in the
realm of science fiction, of course.
.E
. SSSEC(Comparison to Other Systems)
<<Perhaps shorten this "comparison" section>
One popular way to judge a system is to compare it to other, similar
systems, and/or to others' proposed criteria for such systems. There
is virtually no similar project known to the author, despite an
exhausting$$
not exhaus↓_tive_↓; but weigh the Bibliography. $
search. A couple of tangential
efforts will be mentioned.
Several projects have been undertaken which overlap small pieces of
the AM system and in addition concentrate deeply upon some area ⊗4not⊗*
present in AM. Boyer and Moore's theorem-prover embodies some of
the spirit of AM (e.g., generalizing the definition of a
LISP function), but its motivations are quite different, its
knowledge base is minimal, and its methods purely formal.$$ This is
not meant as criticism; considering the goals of those researchers,
and the age of that system, their work is quite exciting. $ Badre's
CLET system worked on learning the decimal addition algorithm$$ Given
the addition table up to 10 + 10, plus an English text description of
what it means to carry, how and when to carry, etc., actually write a
program capable of adding two 3-digit numbers $ but the
"⊗4mathematics discovery⊗*" aspects of that system were neither
emphasized nor worth emphasizing; it was an interesting natural
language communication study. The same comment applies to several
related studies by IMSSS$$See [Smith], for example.$.
Many researchers have constructed programs which pioneered some of
the techniques AM uses$$ In many cases, those techniques were used
for the first time, hence were thought of as "tricks". $. Gelernter
used prototypical examples as analogic models to guide search in
geometry, and Bundy has used "sticks" to help his program work with
natural numbers. Kling has studied the single heuristic of analogy,
and Brotz has written a system which uses this to propose useful
lemmata; both of these computer programs are set up as theorem
provers, again not as discoverers.
As for theorem-⊗4provers⊗*, they are (to date) quite different from AM
in both dimensions of design and overall goal. As Bledsoe says$$
[Bledsoe 71], p. 73 $:
.BEGIN FILL INDENT 5,5,5 SELECT 3
There is a real difference between ⊗4doing⊗* some mathematics and ⊗4being⊗*
a mathematician. The difference is principally one of judgment: in the
selection of a problem (theorem to be proved); in determining its relevance;...
It is precisely in these areas that machine provers have been so lacking.
This kind of judgment has to be supplied by the user... Thus a crucial part
of the resolution proof is the ⊗4selection⊗* of the reference theorems by the
⊗4human⊗* user; the human, by this one action, usually employs more skill
than that used by the computer in the proof.
.END
One aspect that each of these systems lacked was size: they all
worked in tiny toy domains, with minuscule, carefully prearranged
knowledge bases, with just enough information to do the job well, but
not so much that the system might be swamped. AM is open to all the
advantages and all the dangers of a non-toy system with a massive$$
by contemporary standards. This data base will no doubt seem pitiful
a generation from now. $ corpus of data to manage. The other systems
did not often contain multiple representations for the same
knowledge, and they didn't rely on a large base of heuristic rules to
propose plausible tasks to work on, and their behavior was
problem-solving rather than theory formation. They were goal-driven
(the topmost goal being a desired theorem to prove). Certainly none
has considered the paradigm of ⊗4discovery and evaluation of the
interestingness of structure⊗*; the others have been "here is your
task, try and prove it," or, in Badre's case, "here is the answer,
try and translate/use it."
One exception to this will be Paul Cohen's math discovery system,
just entering the planning stages at the moment. The emphasis of that
program will be on formal proof, on random logical connections of
existing concepts (theorems) into new ones, interspersed with periods
of critical evaluation and pruning. Heuristic rules will not be
present to propose new concepts or future tasks,
but rather only to evaluate what has been done and prune away undesirable
new concepts.
This design is
similar to that of the early "learning by evolution" programs.
Theory formation systems in ⊗4any⊗* field have been few. Meta-dendral
represents the best of these. Its task is to unify a body of mass
spectral data, examples of "proper" identifications of spectra, into
a small body of rules for making identifications. Thus even this
system is given a fixed task, a fixed set of data to find
regularities within. AM, however, must find its own data, and take
the responsibility for managing its own time, for not looking too
long at worthless data (e.g., maximally-divisible numbers, which we
all know have no regularity!).
Besides "math systems", and "creative thinking systems", and "theory
formation systems", we should at least discuss others' thoughts on
the issue of algorithmically doing math research. Some individuals
feel it is not so far-fetched to imagine automating mathematical
research (e.g., Paul Cohen). Others (e.g., Polya) would probably
disagree.
There is very little thought about discovery in mathematics from an
algorithmic point of view; even clear thinkers like Polya and
Poincare' treat mathematical ability as a sacred, almost mystic
quality, tied to the unconscious. The writings of philosophers and
psychologists invariably attempt to examine human performance and
belief, which are far more manageable than creativity ⊗4in vitro⊗*.
Belief formulae in inductive logic (e.g., Carnap, Hintikka,
Pietarinin) invariably fall back upon how well they fit human
measurements. The abilities of a computer and a brain are too
distinct for us to blindly demand that the results (let alone the
algorithms!) of the one match those of the other.
.SSEC(Capabilities and Limitations of AM)
The first subsection contains a general discussion of what AM can and
can't do. Later subsections deal with powers and limitations inherent in using
an agenda scheme, in fixing the domain of AM, and in picking one specific
model of math research to build AM upon.
Finally, some speculation is made about the ultimate
powers and weaknesses of any systems which are designed very much like AM.
. SSSEC(Current Abilities)
⊗4What fields has AM worked in so far?⊗* AM is now able to explore a
small bit of the theory of sets, data types, numbers, and plane
geometry. It by no means has been fed -- nor has it rediscovered --
a large fraction of what is known in any of those fields. It might be
more accurate to be humble and restate those domains as: elementary
finite set theory, trivial observations about four kinds of data
types, arithmetic and elementary divisibility theory, and simple
relationships between lines, angles, and triangles. So a
sophisticated concept in each domain -- which was discovered by AM --
might be: de Morgan's laws, Delete-o-Insert$$
Take an item x, insert it into the structure B, then delete one occurrence
of x from B.
$ never alters Bags or
Lists, unique factorization, similar triangles.
⊗4Can AM work in a new field, like politics?⊗* AM can work in a new
elementary, formalized domain, if it is fed a supplemental base of
conceptual primitives for that domain. To work in plane geometry, it
sufficed to give AM about twenty new primitive concepts, each with a
few parts filled in. Another domain which AM could work in would be
elementary mechanics.
⊗4Can AM discover X? Why didn't it...?⊗* It is difficult to predict
whether AM will (without modifications) ever make a specific given
discovery. Although its capabilities are small, its limitations are
hazy. What makes the matter even worse is that, given a concept C
which AM missed discovering, there is probably a reasonable heuristic
rule which is missing from AM, which would enable that discovery. One
danger in this "debugging" is that a rule will be added which only
leads to that one desired discovery, and isn't good for anything
else. This must be avoided at all costs, ⊗4even at the cost of
intentionally giving up a certain discovery.⊗* If the needed rule is
general -- it has many applications and leads to many interesting
results -- then it really was an oversight not to include it in AM.
Although I believe that there are not too many such omissions still
within the small realm in which AM works, there is no objective way to
demonstrate that, except by further long tests with AM.
.ONCE TURN ON "{}"
⊗4In what ways are new concepts created?⊗* Although the answer to
this is accurately given in Section {[2]HEURS}.{[2]CREATECON}
(namely, this is mainly the jurisdiction of the right sides of
heuristic rules), and although I dislike the simple-minded way it
makes AM sound, the list below does characterize the major ways in
which new concepts get born:
.BN SELECT 6
Fill in examples of a concept (e.g., by instantiating its recursive
definition)
Create a generalization of a given concept (e.g., by weakening its
definition)
Create a specialization of a given concept (e.g., by restricting its
domain/range)
Compose two operations f,g, thereby creating a new one h
[h(x)≡f(g(x))]
Coalesce an operation f into a new one g [g(x)≡f(x,x)]
Permute the order of the arguments of an operation [g(x,y)≡f(y,x)]
Invert an operation [g(x)=y iff f(y)=x] (e.g., from Squaring, create
Square-rooting)
Canonize one predicate P1 with respect to a more general one P2
[create a new concept f, an operation, such that: P2(x,y) iff
P1(f(x),f(y))]
Create a new operation g, which is the repeated application of an
existing operation f.
Trivial logical combinations of existing concepts x,y: ⊗6x∧y, x∨y,
¬x,⊗* etc.
.E
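In modern notation, several of these creation methods can be sketched
as higher-order functions. The sketch below is in Python purely for
illustration (AM itself was an Interlisp program); the function names
are invented here, not AM's own:

```python
# Illustrative sketches of some concept-creation operators from the
# list above. Names (compose, coalesce, permute, repeat) mirror the
# list, not any actual AM routine.

def compose(f, g):
    """Compose two operations f,g into h, where h(x) = f(g(x))."""
    return lambda x: f(g(x))

def coalesce(f):
    """Coalesce a two-argument operation f into g(x) = f(x, x)."""
    return lambda x: f(x, x)

def permute(f):
    """Permute the argument order of f: g(x, y) = f(y, x)."""
    return lambda x, y: f(y, x)

def repeat(f, n):
    """The n-fold repeated application of an existing operation f."""
    def g(x):
        for _ in range(n):
            x = f(x)
        return x
    return g

# e.g., coalescing multiplication yields squaring,
# and coalescing addition yields doubling:
square = coalesce(lambda x, y: x * y)
double = coalesce(lambda x, y: x + y)
```

Coalescing multiplication into squaring is exactly the kind of step by
which AM moves from a general operation to an interesting special case.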
Below is a similar list, giving the primary ways in which AM
formulates new conjectures:
.BN SELECT 6
Notice that concept C1 is really an example of concept C2
Notice that concept C1 is really a specialization (or:
generalization) of C2
Notice that C1 is equal to C2; or: ⊗4almost always⊗* equal
Notice that C1 and C2 are related by some known concept (e.g., if
operation C3 is run on an argument which is a C1, then its value will
always be a C2)
If an analogy exists, try to extend it to the conjectures as well.
.E SELECT 1
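The empirical flavor of the first few conjecture types above can be
sketched very simply: compare two concepts' definitions against a
shared pool of examples. This Python fragment is a hypothetical
stand-in for AM's actual machinery, with invented predicates and data:

```python
# A sketch of empirical conjecture-noticing: judge the relation
# between two predicates C1, C2 solely from a finite example pool.
# Any conjecture found this way is only plausible, not proven.

def conjecture(c1, c2, examples):
    """Return a plausible relation between predicates c1 and c2."""
    in1 = {x for x in examples if c1(x)}
    in2 = {x for x in examples if c2(x)}
    if in1 == in2:
        return "C1 equals C2"
    if in1 < in2:            # proper subset of the examples
        return "C1 specializes C2"
    if in1 > in2:
        return "C1 generalizes C2"
    return "no relation noticed"

evens = lambda n: n % 2 == 0
multiples_of_4 = lambda n: n % 4 == 0
pool = range(1, 100)
# empirically, multiples of 4 appear to be a specialization of evens
```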
In summary, we can say that AM has achieved its original purpose: to be
guided successfully by a large set of local heuristic rules, in the
discovery of new mathematical theories.
Besides creating new concepts and noticing conjectures, AM has the
key "ability" of appearing to decide rationally what to work on
at each moment. This is a result of the agenda of tasks --
containing associated reasons. Of course all of these abilities stem
from the quality and the quantity of local heuristic rules: little
plausible move generators and evaluators.
. SSSEC(Current Limitations)
Below are several shortcomings of AM, which hurt its behavior but are
not believed to be
inherent limitations of its design. They are presented in order
of decreasing severity.
Perhaps the most serious limitation on AM's current behavior arose
from the lack of constraints on left sides of heuristic rules. It
turned out that this excessive freedom made it difficult for AM to
inspect and analyze and synthesize its own heuristics; such a need
was not foreseen at the time AM was designed. It was thought that the
power to manipulate heuristic rules was an ability which the author
must have, but which the system wouldn't require. As it turned out,
AM did successfully develop new concepts several levels deeper than
the ones it started with. But as the new concepts got further and
further away from those initial ones, they had fewer and fewer
specific heuristics filled in (since they had to be filled in by AM
itself). Gradually, AM found itself relying on heuristics which were
very general compared to the concepts it was dealing with (e.g.,
forced to use heuristics about Objects when dealing with Numbers).
Heuristics for dealing with heuristics do exist, and their number
could be increased. This is not an easy job: finding a new
meta-heuristic is a tough process. Heuristics are rarely more than
compiled hindsight; hence it's difficult to create new ones "before the fact".
.ONCE TURN ON "{}"
AM has no notion of Proof, proof techniques, formal validity,
heuristics for finding counterexamples, etc. Thus it never really
establishes any conjecture formally. This could probably be remedied
by adding about 25 new concepts (and their 100 new associated
heuristics) dealing with such topics. The needed concepts have been
outlined on paper (and are present in Appendix {[2]ALLCON}). It would
probably require a few hundred hours to code and debug them.
The user interface is quite primitive, and this again could be
dramatically improved with just a couple hundred hours' work. AM's
explanation system is almost nonexistent: the user must ask a
question quickly, or AM will have already destroyed the information
needed to construct an answer. A clean record of recent system
history and a nice scheme for tracking down reasons does not exist at
the level which is found, e.g., in MYCIN. There is no trivial way to
have the system print out its heuristics in a format which is
intelligible to the untrained user.
An important type of analogy which was untapped by AM was that
between heuristics. If two situations were similar, conceivably the
heuristics useful in one situation might be useful (or have useful
analogues) in the new situation. Perhaps this is a viable way of
enlarging the known heuristics. Such "meta-level" activities were
kept to a minimum throughout AM, and this proved to be a serious
limitation. My intuition tells me that the "right" ten meta-rules
could correct this particular deficiency.
The idea of "Intuitions" facets was a flop.
Intuitions were meant to model reality, at least little pieces of it,
so that AM could perform (simulate) physical experiments, and observe
the results. The major problem here was that ⊗4so⊗* little of the world was
modelled that the only relationships derivable were those foreseen by the
author. This lack of generality was unacceptable, and the intuitions were
completely excised. The original idea might lead somewhere if it were
developed fully. As with all limitations of AM, I leave this as an open suggestion
for future research.
Several limitations arose from the constraints of the agenda scheme,
from the choice of finite set theory as the domain to work in, and
from the particular model of math research that was postulated. These
will be discussed in the next three subsections.
. SSSEC(Limitations of the Agenda scheme)
.COMMENT Should these DIFlabels go in the sec's header?;
.DIFSECNUM: SECNUM;
.DSEC: SECNUM;
.COMMENT Not sure about that last label-- it might not be meant for
.here;
.DIFSSECNUM: SSECNUM;
.DIFSSEC: SSECNUM;
.DIFSEC: PAGE;
The following quibbles with the agenda scheme get less and less
important. When you get bored, skip to the next subsection.
Currently, it is difficult to include heuristics which interact with
one another in any significant way. The whole fibre of the Agenda
scheme assumes perfect independence of heuristics. The global formula
used to rate tasks on the agenda assumes perfect superposition of
reasons: there are no "cross-terms". Is this assumption always
valid? Unfortunately ⊗4no⊗*, not even for the limited domain AM has
explored. Sometimes, two reasons are very similar: "Examples of Sets
would permit finding examples of Union" and "Examples of Sets would
permit finding examples of Intersection". In that case, their two
ratings shouldn't cause such a big increase in the overall priority
value of the task "⊗6Fillin examples of Sets⊗*".
Sometimes, a heuristic rule will want to ⊗4dissuade⊗* the system from
some activity. Thus a ⊗4negative⊗* numeric contribution to a task's
priority value is desired. This is not figured into the current
scheme. With a slight modification, the global formula could preserve
the sign (signum) of each reason's rating.
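The two modifications just described -- damping near-duplicate reasons
and allowing dissuading reasons -- amount to a different combining
formula. The sketch below (Python, for illustration only; the actual
formula AM uses differs) shows one sign-preserving possibility:

```python
# A hypothetical sign-preserving priority formula: each reason
# contributes a signed rating; negative reasons now subtract.
# Dividing by sqrt(n) keeps many weak reasons from swamping one
# strong reason -- a crude hedge against the "no cross-terms" problem.
import math

def priority(reason_ratings):
    """Combine a list of signed reason ratings into one task priority."""
    if not reason_ratings:
        return 0
    return sum(reason_ratings) / math.sqrt(len(reason_ratings))
```

Under this formula a dissuading reason of -200 genuinely lowers a
task's standing, which the current (all-positive) scheme cannot express.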
Tasks on the agenda list are ordered by their numeric priority value.
Each reason's numeric value is kept, too. When new reasons are
added, these values are used to recompute a new priority for the
task. Each reason's rating was computed by a little formula found
inside some heuristic rule. Those formulae are not kept hanging
around. One big improvement in apparent intelligence could be
attained by tacking on those little formulae to the reasons. When a
new reason is added, the old reasons' rating formulae would be
evaluated again. They might indeed give new numbers. For example,
suppose one reason was "Few examples of X are known". But by now,
other tasks have meanwhile inadvertently filled in several examples
of X. Then that little reason's formula would come up with a much
lower value than it did originally. In fact, the value might be so
low that the reason was dropped altogether. If the formulae were
kept, it might be good practice to evaluate them for the top two or
three tasks on the agenda, to see if they might change their
ordering. Also, the top task's priority would then be more accurate,
and recall that its value is used to determine the cpu time and list
cell space quanta that the task is allowed to use up. At the moment,
AM is not set up to store the little functions, and if modified to do
so, it uses up a lot more space than it can afford. Also, the top
few jobs are almost never semantically coupled (except by "focus of
attention"), so the precise order in which they are executed rarely
matters.
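The proposal above -- keeping each reason's rating formula attached to
the reason, and re-evaluating it when circumstances change -- can be
sketched with closures. This is a minimal illustrative model, not AM's
actual task representation:

```python
# A sketch of storing each reason's rating formula with the reason,
# so ratings can be recomputed as the situation changes. Reasons
# whose ratings decay to zero are dropped altogether, as suggested.

class Task:
    def __init__(self, description):
        self.description = description
        self.reasons = []                  # list of (text, formula)

    def add_reason(self, text, formula):
        self.reasons.append((text, formula))

    def priority(self):
        # re-evaluate every stored formula; drop dead reasons
        self.reasons = [(t, f) for t, f in self.reasons if f() > 0]
        return max((f() for _, f in self.reasons), default=0)

examples_of_X = []
task = Task("Fill in examples of X")
# "Few examples of X are known" -- worth less as examples accumulate
task.add_reason("Few examples of X are known",
                lambda: max(0, 500 - 100 * len(examples_of_X)))
```

As examples of X accumulate, re-evaluating the stored formula lowers
(and eventually removes) that reason, just as the text describes.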
Perhaps what is needed is not a single priority value for each task,
but a vector of numbers. At each cycle, AM would construct a vector
of its current "interests" and needs, and each task's vector would be
dot-multiplied against this global vector of AM's desires. The
highest scorer would then be chosen. For example, one dimension of
the rating could be "safety", and one could be "best possible
payoff", one could be "average expected payoff", etc. Sometimes, AM
would have to break out of a stagnant situation, and it would be
willing to try riskier tasks than usual. This was not implemented
because of the great increase in cpu time it would cause. It is,
however, probably a better design than the current one. Even more
intelligent schemes can be envisioned -- involving more and more
symbolic data being stored with each task. Ultimately, this would be
just the English reasons themselves; by that time, the task-orderer
would have grown into an incredibly complex AI program itself (a
natural language program plus an interrelator plus...).
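The vector-valued scheme can be sketched directly: each task carries a
vector of scores along several dimensions, AM's momentary "desires"
form a weight vector, and the winner maximizes the dot product. The
dimensions and numbers below are invented for illustration:

```python
# A sketch of vector-valued task rating. Each task's vector gives
# (safety, best-possible payoff, average expected payoff); AM's
# current mood is a weight vector dotted against each task.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def choose_task(tasks, desires):
    """tasks: dict mapping task name -> rating vector."""
    return max(tasks, key=lambda name: dot(tasks[name], desires))

tasks = {
    "Fillin examples of Sets":        (9, 3, 5),
    "Check generalizations of Union": (4, 9, 4),
}
cautious = (10, 1, 3)   # normal mood: value safety
stagnant = (1, 10, 2)   # stuck: willing to gamble on a big payoff
```

The same two tasks are chosen in opposite orders depending on the mood
vector, which is exactly the "break out of stagnation" behavior desired.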
The agenda list should really be an agenda ⊗4tree⊗*$$
maybe an agenda ↓_Heap_↓.
$, since the
ordering of tasks is really just partial, not total. If this is
clear, then skip the rest of this paragraph. There are some
"legitimate" orderings of tasks on the agenda; if task X is supported
by a subset of the reasons which support Y, then clearly the priority
of X should be less than or equal to the priority of Y. Two tasks
of the form "Fillin examples of A", "Fillin examples of B" can be
ordered simply because A is currently much more interesting than B.
But often, two tasks will have no ironclad ordering between them:
compare "Fillin examples of Sets" and "Check generalizations of
Union". Thus the ordering is only partial, and it is the artifice of
the global evaluation function which embeds this into a linear
ordering. If multiprocessors are used, it might be advantageous to
keep the original partial ordering around.
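The "legitimate" orderings just described form only a partial order,
which a one-line test makes concrete. The reason strings below are
illustrative:

```python
# A sketch of the partial ordering on agenda tasks: X necessarily
# lies below Y only when X's supporting reasons are a (proper)
# subset of Y's. Most task pairs are simply incomparable, so any
# single numeric priority over-commits to a total order.

def necessarily_below(x_reasons, y_reasons):
    """True iff every reason for X also supports Y."""
    return set(x_reasons) < set(y_reasons)

x = ["few examples of Sets are known"]
y = ["few examples of Sets are known",
     "examples of Sets would permit finding examples of Union"]
z = ["generalizations of Union were recently changed"]
# x lies below y; x and z are incomparable
```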
. SSSEC(Choice of Domain)
Here are some questions this subsection will address:
.BN
⊗8# ⊗* What are the inherent limitations -- and advantages -- in
fixing a domain for AM to work in?
⊗8# ⊗* What characteristics are favorable to automating research in
any given domain?
⊗8# ⊗* What are the specific reasons for and against elementary
finite set theory as the chosen starting domain?
.E
Research in various domains of science and math proceeds slightly
differently. For example, psychology is interested in explaining
people, not in creating new kinds of people. Math is not interested
in individual entities so much as in new kinds of entities. There are
ethical restrictions on physicians which prevent certain experiments
from being done. Political experiments rarely permit backtracking,
etc. Each field has its own peculiarities.
If we want a system to work in many domains, we'd have to sacrifice
some power$$
This is assuming a system of a given fixed size. If
this restriction isn't present, then a reasonable "general-purpose"
system could be built as several systems linked by one giant
switch. $. Within a given field of knowledge (like math), the finer the
category we limit ourselves to, the more specific are the heuristics
which become available. So it was reasonable to make this first
attempt limited to one narrow domain.
This brings up the choice of domain. What should it be? As the
DENDRAL project illustrated so clearly$$ see [Buchanan, Feigenbaum,
et al.]. Choice of subject was enabled by J. Lederberg in 1965. $,
choice of subject domain is quite important when studying how
researchers discover and develop their theories. Mathematics was
chosen as the domain of this investigation, because
.BN
λλ In doing math research, one needn't cope with the uncertainties
and fallibility of testing equipment; that is, there are no
uncertainties in the data (compared to, e.g., chemical identification
from mass spectrograms).
λλ Reliance on experts' introspections is one of the most powerful
techniques for codifying the judgmental criteria necessary to do
effective work in a field; I personally have had enough training in
elementary mathematics so that I didn't have to rely completely on
external sources for guidance in formulating such heuristic rules.
Also, several excellent sources were available [Polya, Skemp,
Hadamard, Kershner, etc.].
λλ A hard science is of course easier to work in than a soft one; to
automate research in psychology would require more knowledge about
human information processing than now is known, because psychology
deals with entities as complex as you and I. Also, in a hard science,
the ⊗4languages⊗* to communicate information can be simple even
though the ⊗4messages⊗* themselves may be sophisticated.
λλ Since mathematics can deal with any conceivable constructs, a
researcher there is not limited to explaining observed data. Related
to this is the freedom to investigate -- or to give up on -- whatever
the researcher wants to. There is no single discovery which is the
"goal", no given problem to solve, no right or wrong behavior.
λλ Unlike "simpler" fields, such as propositional logic, there is an
abundance of heuristic rules available for the picking.
.E
The limitations of math as a domain are closely intertwined with its
advantages. Having no ties to real-world data can be viewed as a
limitation, as can having no clear goal. There is always the danger
that AM will give up on each theory as soon as the first tough
obstacle crops up.
Since math has been worked on for millennia by some of the greatest
minds from many different cultures, it is unlikely that a small
effort like AM would make any new inroads or have any startling
insights. In that respect, Dendral's task was a lot more promising:
humans had not yet studied all isomers of common cyclic organic
molecules in a systematic way (they are too numerous). Of course
math -- even at the elementary level that AM explored it -- still has
undiscovered gems (e.g., the recent unearthing of Conway's numbers [Knuth 74]).
As Lederberg commented$$ Joshua Lederberg's review of Joseph
Weizenbaum's book,...$, AI can succeed in automating an activity only
when a "strong theory" of that activity exists. AM is built on a
detailed model of how humans do math research. In the next
subsection, we'll discuss the model of math research that AM assumes.
Before that, consider for a moment which other fields of human
endeavor have a good model, and also enjoy all the advantages listed
above: Other domains of math, classical physics, ..?.. -- not too
many.
. SSSEC(Limitations of the Model of Math Research)
AM, like anything else in this world, is constrained by a mass of
assumptions. Most of these are "compiled" or interwoven into the very
fabric of AM, hence can't be tested by experiments on AM. AM is
built around a particular model of how mathematicians actually go
about doing their research. This model was derived from
introspection, but can be supported by quotes from Polya, Kershner,
Hadamard, Skemp, and many others. No attempt will be made to justify
any of these premises. Here is a simplified summary of that
information processing model for math theory formation:
.ONCE NARROW 2,2 SELECT 8 TURN OFF "α" TURN ON "∞→"
⊂∞α→⊃
.MODELPAGE: PAGE;
.MODELSSEC: SSECNUM;
.MODELSSSEC: SSSECNUM;
.MODELSEC: SECNUM;
.BREAK; BN NARROW 2,2 SINGLE SPACE PREFACE 0
λλ The order in which a math textbook presents a theory is almost the
exact opposite of the order in which it was actually discovered and
developed. In a text, new definitions are stated with little or no
motivation, and they turn out to be just the ones needed to state the
next big theorem, whose proof then magically appears. In contrast, a
mathematician doing research will examine some already-known
concepts, perhaps trying to find some regularity in experimental data
involving them. The patterns he notices are the conjectures he must
investigate further, and these relationships directly motivate him to
make new definitions.
λλ Each step the researcher takes while developing a new theory
involves choosing from a large set of "legal" alternatives -- that
is, searching. The key to keeping this from becoming a blind,
explosive search is the proper use of evaluation criteria. Each
mathematician uses his own personal heuristics to choose the "best"
alternative available.
λλ Non-formal criteria (aesthetic interestingness, inductive
inference from empirical evidence, analogy, and utility) are much
more important than formal deductive methods in developing
mathematically worthwhile theories, and in avoiding barren
diversions.
λλ Progress in ⊗4any⊗* field of mathematics demands much non-formal
heuristic expertise in ⊗4many⊗* different "nearby" mathematical
fields. So a broad, universal core of knowledge must be mastered
before any single theory can meaningfully be developed.
λλ It is sufficient (and pragmatically necessary) to have and use a
large set of informal heuristic rules. These rules direct the
researcher's next activities, depending on the current situation he
is in. These rules can be assumed to superimpose ideally: the
combined effect of several rules is just the sum of the individual
effects.
λλ The necessary heuristic rules are virtually the same in all
branches of mathematics, and at all levels of sophistication. Each
specialized field will have some of its own heuristics; those are
normally much more powerful than the general-purpose heuristics.
λλ For true understanding, the researcher should grasp$$
Have access
to, relate to, store, be able to manipulate, be able to answer
questions about $ each concept in several ways: declaratively,
abstractly, operationally, knowing when it is relevant, and as a
bunch of examples.
λλ Common metaphysical assumptions about nature and science: Nature
is fair, uniform, and regular. Coincidences have meaning.
Statistical considerations are valid when looking at mathematical
data. <<Perhaps incorporate some of Iberall's metaphysics here.>
.WIDEN 2,2
.ONCE SELECT 8 TURN OFF "α" TURN ON "∞→"
%∞α→$
.END
. SSSEC(Ultimate powers and weaknesses)
Consider now ⊗4any⊗* system which is consistent with the preceding
model of math research, and whose orientation is to discover and
develop new (to the system) mathematical theories. This includes AM
itself, but might also include a bright high-school senior who has
been taught a large body of heuristic rules.
What can such systems ultimately achieve? What are their ultimate
limits? Answers to ultimate questions are hard to come by
experimentally, so this discussion will be quite philosophical,
emotional, and short. The model of math research hinges around the
use of heuristic rules for guidance at all levels of behavior. It is
questionable whether or not all known mathematics could evolve
smoothly in this way. As a first-order fixup, we've mentioned the
need to provide good meta-heuristics, to keep enlarging the set of
heuristics. If this is not enough (if meta-meta-...-heuristics are
needed), then the model is a poor one and has some inherent
limitations. If some discoveries can only be made irrationally, by
random chance, then any such system would be incapable of finding
those concepts.
Turning aside from math, what about systems whose design -- as a
computer program -- is similar to AM?$$ Having an agenda of tasks
with reasons and reason-ratings combining to form a global priority
for each task, having units/modules/frames/Beings/Actors/concepts
which have parts/slots/facets, etc. Heuristic rules are tacked onto
relevant concepts, and are executed to produce new concepts, new
tasks, new facet entries. $ Building such systems will be "fun", and
perhaps will result in new discoveries in other fields.
Eventually, scientists (at least in a few very hard domains) may
relegate more and more of their "hack" research duties to AM-like
systems. The ultimate limitations will be those arising from
incorrect (e.g., partial) models of the activities the system must
perform. The systems themselves may help improve these models:
experiments that are performed on the systems are actually tests of
the underlying model; the results might cause revisions to be made in
the model, then in the system, and the whole cycle would begin again.
.SSEC(Final Conclusions)
<<This "final conclusions" section is not written yet.>
This is the final section in the body of the thesis.
It should summarize the entire project, emphasizing the
overall importance (contribution to knowledge), the
interesting details worth remembering (agenda with rules, a couple
new bits of knowledge), summarizing judgments, etc.
Perhaps: Segue into Appendix 1 [model of scientific research as heur search].